Separate data/coherency caches in a shared memory multiprocessor system

ABSTRACT

The system and method described herein is a dual system directory structure that performs the role of system cache, i.e., data, and system control, i.e., coherency. The system includes two system cache directories. These two cache directories are equal in size and collectively large enough to contain all of the processor cache directory entries, but only one of these cache directories hosts system-cache data to back the most recent fraction of data accessed by the processors. The other cache directory retains only addresses, including addresses of lines LRUed out, and the processor using the data. By this expedient, only the directory known to be backed by system cached data will be evaluated for system cache data hits.

BACKGROUND

1. Field of the Invention

The method and system relate to assuring that data stored in large shared cache memory in a multi-processor environment and the data of the main memory are either identical or controlled so that stale and current data are not confused with each other.

2. Background Art

In a shared memory multiprocessor of the prior art, with a separate cache memory for each processor, it is possible to have many copies of any one instruction operand: one copy in the main memory and one in each individual cache memory. When one copy of an operand is changed, the other copies of the operand must be changed also, i.e., cache coherence. Cache coherence is the discipline that ensures that changes in the values of shared operands are propagated throughout the system in a timely fashion.

There are three distinct levels of cache coherence:

1. Every write operation appears to occur instantaneously.
2. All processes see exactly the same sequence of changes of values for each separate operand.
3. Different processes may see an operand assume different sequences of values. (This is considered noncoherent behavior.)

In both level 2 behavior (where all processes see exactly the same sequence of changes of values for each separate operand) and level 3 behavior (where different processes may see an operand assume different sequences of values, i.e., noncoherent behavior), a program can observe stale data.

In a large shared-memory multiprocessor, providing a system level cache of the recently accessed contents of memory, along with an efficient means to handle system-wide cache coherency, can theoretically be accomplished with a single system cache directory array by requiring the contents of the respective processor level caches to be a subset of the system cache. Unfortunately, when the combined size of the processor caches is sufficiently large, this subset rule can become impractical because the resulting size of the system cache required to work effectively becomes too big.

While one possible solution to this is to maintain a single system directory only partially backed by cache data, this proves difficult in practice. This is because the logic must now evaluate which entries have data and which do not when determining a system cache data hit.

SUMMARY OF THE INVENTION

As a solution to the problems inherent in maintaining a single system directory only partially backed by cache data, where the logic must evaluate which entries have data and which do not when determining a system cache data hit, we provide a system with two complementary system cache directories. That is, these directories are not associated with a particular processor or processors, but are system cache directories accessible to all of the processors.

These two directories are substantially equal in size and collectively large enough to contain all of the processor cache directory entries, but with only one of these cache directories hosting system-cache data to back the most recent fraction of data accessed by the processors. By this expedient, only the directory known to be backed by system cached data will be evaluated for system cache data hits.

More specifically, the method and system use two non-overlapping system cache directories. These are substantially equally sized and are referred to herein as superset and subset directories. As used herein, the following terms have the following meanings:

-   “Superset Directory” is a system cache directory backed by system cache data, which is the superset of data available to the processors or subset caches. That is, the superset directory manages coherency and data.
-   “Subset Directory” is a system cache directory not backed by system cache, that is, where only the processors or subset caches have data not available in the system cache. In this way the subset cache manages coherency only.

Together the two cache directories are sufficient in size to contain the entries of all underlying processor cache directories, together with the system cache.

The total cache is as large as practical given such factors as fast access times and reasonable chip area, and maps the most recent fraction of the system cache directory entries into the superset cache.

The two caches collectively behave both as a data cache and a system wide coherency cache. As a data cache, the structure is used for recent memory data needed by the processors. As a system wide coherency cache, the structure is used as a system wide coherency controller. The coherency controller maintains a record of which processors in the multi-processor have copies of which portions of memory and what state they are in.

The system and method described herein is a dual system directory structure that performs the role of system cache, i.e., data, and of system control, i.e., coherency, when the size of the system cache is insufficient to contain the contents of all of the underlying caches. This avoids the complexity of a single structure solution, with its extra MRU/LRU complexity, area, and data backed hit detection. It also avoids the extra cache level overhead on the underlying caches to manage coherency, and even the absence of a system data cache.

An advantage in using these two symmetric directories is that their entries are paired one-to-one, such that once an entry is identified for LRU-out of the superset directory, the respective subset entry to receive this LRU-out entry is immediately identified, without requiring any special logic of its own.

Furthermore, a processor requesting a system lookup requires a scan of the superset directory only, rather than a lookup in a combined directory, which would require additional logic to distinguish between data-backed and data-less system cache entries.
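
The data-hit rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of any particular embodiment: the dictionaries `superset`, `subset`, and `system_cache_data`, the functions `data_hit` and `owners`, and the sample addresses are hypothetical names chosen for the example.

```python
# Hypothetical model: each directory maps a line address to the set of
# processor IDs holding the line. Only superset entries have cached data.
superset = {0x1000: {0, 3}}          # data-backed entries
subset = {0x2000: {5}}               # address and owner records only
system_cache_data = {0x1000: b"line contents"}


def data_hit(address):
    """Return cached data on a system cache data hit, else None.

    Only the superset directory is consulted, so no per-entry
    "has data" flag needs to be checked.
    """
    if address in superset:
        return system_cache_data[address]
    return None


def owners(address):
    """Return the processors recorded as holding the line (coherency view)."""
    return superset.get(address) or subset.get(address) or set()
```

In this sketch an address found only in the subset directory is never a data hit, yet its holders remain visible for coherency purposes.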

As far as the design is concerned, the superset directory and system cache are collectively what would be considered a traditional system cache. With the subset directory added to the mix, entries that would normally be LRUed out of the superset directory, and therefore also out of the respective processors, are now placed in the subset directory and left to persist in the processors. This improves overall performance.

THE FIGURES

Various aspects of our invention are illustrated in the Figures appended hereto.

FIG. 1, denominated “Prior Art,” illustrates a high level representation of a multi-processor system having only individual cache memories, each such cache memory associated with a single processor.

FIG. 2 illustrates a high level representation of a multi-processor system having a system cache with a dual system directory structure that performs the role of system cache, i.e., data, and system control, i.e., coherency.

DETAILED DESCRIPTION

The method and system of the invention use two complementary caches, one cache for data and the other cache for coherency. These two directories are substantially equal in size and collectively large enough to contain all of the processor cache directory entries, but with only one of these cache directories hosting system-cache data to back the most recent fraction of data accessed by the processors. By this expedient, only the directory known to be backed by system cached data will be evaluated for system cache data hits.

The system and method described herein is a dual system directory structure that performs the role of system cache, i.e., data, and system control, i.e., coherency, when the size of the system cache is insufficient to contain the contents of all of the underlying caches.

FIG. 1 illustrates a system with a set of processors 101, 103. Each of the processors 101, 103 has its own set of associated caches 101A, 103A, which caches 101A, 103A contain copies of the recent instructions and data of recent work performed on the associated processor. The individual, associated caches, 101A, 103A, are connected through a data bus 111 to a system cache 115, interposed between the processors 101, 103, and through bus 117 to the main memory 131.

To summarize the invention, both the superset 221 a and 221 b and subset 223 a and 223 b directories collectively manage cache coherency. The superset directory 221 a and 221 b is also backed by data, and can readily supply it to any requesting processor (this is what is recognized as a “traditional data/coherency cache”).

The subset directory 223 a and 223 b is a low-overhead means of allowing processors 201 a through 201 h to retain a line LRU'd out of the traditional/superset cache 221 a and 221 b. The subset directory 223 a and 223 b is not tied to the superset cache 221 a and 221 b, and it is not actually a data cache but is a directory with memory addresses and processor addresses only. As data is LRUed out of a superset directory 221 a and 221 b, its address is still retained in the subset directory 223 a and 223 b.

Without the subset directory 223 a and 223 b, upon LRUing out of the superset cache 221 a and 221 b, a line is “required” to be invalidated out of every processor 201 a through 201 h; i.e., if the L2 directory can't manage the coherency of a line, then “no” processor may possess the line. However now, the processors can retain these lines a while longer, as the subset cache 223 a and 223 b will continue to manage lines that have been LRU'd-out of the superset cache 221 a and 221 b.
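
The difference between the two cases can be illustrated with a short sketch. It is an assumption-laden model rather than the control logic itself: the function `lru_out`, its parameters, and the dictionary used as a subset directory are hypothetical.

```python
def lru_out(address, holders, subset_directory=None):
    """Handle a line leaving the superset directory.

    holders: set of processor IDs that hold the line.
    subset_directory: dict of address -> holders, or None if the system
    has no subset directory (hypothetical model of both cases).
    Returns the set of processors that must invalidate the line.
    """
    if subset_directory is None:
        # No directory can track the line any longer, so every
        # processor holding it must give it up.
        return set(holders)
    # The subset directory takes over coherency tracking; the
    # processors keep their copies.
    subset_directory[address] = set(holders)
    return set()
```

With a subset directory present, the returned invalidation set is empty and the line's address and holders are simply recorded.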

The processors 201 a through 201 h do not communicate directly with the main memory 231, but rather they communicate through the L2 control logic 211 a and 211 b, the superset directory 221 a and 221 b, and the L2 Data Array 227 a and 227 b.

There is no data bus connected to the subset cache 223 a and 223 b. Only control lines do an address lookup, and only control lines have the control means to invalidate lines out of the processors.

FIG. 2 shows the separate data 227 and control 225 busses. The L2 block 220 a and 220 b containing the control logic, superset and subset directories, and data array is characterized as an “L2 Node”. Note a unidirectional control bus 225 from the superset directory 221 a and 221 b to the subset directory 223 a and 223 b. This unidirectional control bus 225 copies the directory entry being LRU'd out of the superset directory 221 a and 221 b into the subset directory 223 a and 223 b. In one embodiment, the superset directory 221 a and 221 b signals the control logic 211 a and 211 b that it is LRUing a line, which the control logic 211 a and 211 b will then write into the subset directory 223 a and 223 b.

In order to work efficiently there is provided a system control method or system in place to control resource access, ensure cache coherency, etc. This system, structure, or method can be single level or multi-level, and while it is illustrated herein as a single level system, structure, or method, the methods, systems, and structures illustrated herein may be extended to multi-structure systems, methods, and structures.

The system is useful for providing cache coherence to a multiprocessor system having a dual system directory structure having two system cache directories performing the role of system cache for data, and system control for coherency. This is done by writing an entry to a superset cache, and writing an address and a calling processor identifier for the entry to a subset cache, LRUing the entry out of the superset cache and retaining the address and calling processor identifier for the entry in the subset cache. In this way system-cache data is hosted in the subset cache to back the most recent data accessed by the processors. This enables evaluating the subset cache for system cache data hits, as well as maintaining a record in the subset cache of which processors have cached which portions of memory they are actually using.

There are two parts of the system, structure, and method described herein:

-   a. A System Cache. The system cache, also referred to herein as a superset cache 221 a and 221 b, is a cache of all the recent instructions and data of the processors under control of the system. It performs two roles:
    -   i) It resupplies data to a processor cache when such data ages out or is otherwise removed from the processor cache.
    -   ii) It provides data to other processors, for example, by other system caches, when such data is to be used by more than one processor.
-   b. System cache coherency is provided through subset cache 223 a and 223 b, also referred to herein as a coherency cache. As a general rule, memory cannot be accessed every time it is needed or changes. This is because the access time is too large relative to the speed of the processors. Consequently, the function of maintaining a single coherent view of memory to the processors via their respective processor caches falls on the system control method, system, and structure described herein. By maintaining a record of which processors have cached which portions of memory they are actually using, the system control method, system, and structure can take appropriate action when another processor needs to access the same parts of memory.

If the total system cache 221 a, 221 b, 223 a, 223 b can be built sufficiently large, i.e., to manage the system cache 221 a and 221 b and the system cache coherency 223 a and 223 b, such as by requiring all processor cache contents to be part of the system cache 221 a, 221 b, 223 a, 223 b contents (the “subset rule”), there would be no problem. But, if such a cache is too large for practical use, it becomes necessary to redesign the system, or to not have a system cache, or to have the subset rule, above, limit a significant portion of the processor caches available to the associated processors, or to have two separate system cache directories, one to manage the cache and one to manage the cache coherency, with the associated complexity to make it all work.

According to the method, system, and structure described herein, there is provided a superset system cache directory 221 a and 221 b (where all processor cache contents are part of the system cache contents) in conjunction with a subset system cache directory not backed by system cache, that is, where only the processors or subset caches have data not available in the system cache. In this way the subset cache manages coherency only, both the subset system cache directory and the superset system cache directory being, in total, large enough to handle both system cache coherency, together with a system cache of practical size, where only the superset directory (hosting the most recently accessed entries) has corresponding data in the system cache.

For purposes of illustration, consider a system cache, e.g., cache 115 of FIG. 1, or caches 121 and 123 of FIG. 2, of size N=C*A entries, where

-   C represents the number of congruence classes, where each of the congruence classes represents a set of addresses of memory corresponding to one of C possible values.
-   A represents the associativity of each congruence class, that is, the number of cache entries that can share the same address mapping used to select the congruence class.

Typical computer cache designs, e.g., cache 115 in FIG. 1, have such a structure, along with a corresponding cache directory with a similar N=C*A structure, each directory entry representing the corresponding cache entry and containing such information as the memory address of the cache entry, the last processor to access the data, and whether the data has been changed with respect to the memory contents. Lastly, some form of LRU (least recently used) logic is present for each congruence class to manage the entries within that congruence class from the LRU (least recently used) to MRU (most recently used).
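
The N=C*A organization and the directory entry fields named above can be sketched as follows. The line size, the values of C and A, the class `DirectoryEntry`, and the modulo mapping in `congruence_class` are all assumptions made for illustration; the text does not fix any of them.

```python
from dataclasses import dataclass

LINE_SIZE = 256   # bytes per cache line (assumed)
C = 1024          # number of congruence classes (assumed)
A = 4             # associativity: entries per congruence class (assumed)
N = C * A         # total entries in the cache and in its directory


@dataclass
class DirectoryEntry:
    address: int          # memory address of the cached line
    last_processor: int   # last processor to access the data
    changed: bool         # True if the data differs from memory


def congruence_class(address):
    """Map a byte address to one of C congruence classes.

    Using the low-order line-address bits is a common choice and is an
    assumption here; the text does not fix a particular mapping.
    """
    return (address // LINE_SIZE) % C
```

Under these assumed values, addresses 0 and 256 * 1024 fall in the same congruence class and therefore compete for the same A entries.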

In this context, the most prevalent uses of LRU logic are:

1. Update. When an address is looked up and found in a directory, typically it is made the MRU (most recently used) entry, displacing all those entries that stand between it and the most recently used position.
2. Install. When an address is not found in a directory, typically a place is cleared for it, this time choosing the LRU (least recently used) entry, replacing it with the new entry, then making the new entry the MRU entry.
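
The update and install behaviors listed above can be sketched for a single congruence class. The class name `CongruenceClass`, the list-based ordering, and the `access` method are hypothetical; hardware LRU logic is typically implemented differently.

```python
class CongruenceClass:
    """One congruence class with `ways` entries, ordered LRU -> MRU."""

    def __init__(self, ways):
        self.ways = ways
        self.entries = []   # index 0 is LRU, last index is MRU

    def access(self, address):
        """Look up an address; return the address LRUed out, if any."""
        if address in self.entries:
            # Update: move the hit entry to the MRU position.
            self.entries.remove(address)
            self.entries.append(address)
            return None
        victim = None
        if len(self.entries) == self.ways:
            # Install: clear a place by choosing the LRU entry.
            victim = self.entries.pop(0)
        self.entries.append(address)   # the new entry becomes MRU
        return victim
```

The value returned by `access` is the entry that an LRU-out would remove, which is the entry of interest in the migration discussion that follows.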

Returning to the system cache, if N entries are insufficient to provide efficient system cache coherency, but, for example, 2*N entries are sufficient to provide efficient system cache coherency, then the system cache is kept to N entries, but the system directory is doubled to 2*N, as illustrated by caches 121, 123 in FIG. 2, where the existing directory is now called a superset directory (data backed directory), and the newly added entries are the “subset” (data less, coherency directory).

This pair of cache directories, that is, the subset and the superset cache directories, performs both updating and installing. When looking up a particular address, both directories are accessed in parallel. In accordance with the invention described herein, the following changes are made:

First, if a new superset directory entry is installed for modification and a hit for the same entry is detected in the subset directory, the subset directory entry will be invalidated, and the entry will also be invalidated out of the processors identified by the subset directory as hosting the entry.

If a new superset directory entry is installed, and a hit is detected for the same entry in the subset directory, the entry is invalidated out of the subset directory. Additionally,

-   a) If the new entry is shared and the subset directory entry is shared, the entry's state is merged into the superset directory entry while leaving the entries untouched in the processors identified by the subset directory as hosting the entry.
-   b) If the new entry is to be modified and the subset directory entry is shared, the entry/data is invalidated out of all processors identified as having the entry by the subset directory.
-   c) If the subset directory entry is modified, appropriate coherency management is performed as required by the specific implementation.
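
The three cases above can be sketched as a single decision function. The names `SHARED`, `MODIFIED`, and `resolve_subset_hit` are hypothetical, and the handling of case c (invalidate the old holders and flag a write-back) is only one possible choice, assumed here because the text leaves it to the specific implementation.

```python
SHARED, MODIFIED = "shared", "modified"


def resolve_subset_hit(new_state, requester, subset_state, subset_holders):
    """Resolve a subset directory hit during a superset install.

    The subset directory always gives up its entry on such a hit.
    Returns (superset_holders, processors_to_invalidate, writeback).
    """
    if subset_state == MODIFIED:
        # Case c: implementation specific. This sketch assumes the old
        # holders are invalidated and their data is written back.
        return {requester}, set(subset_holders), True
    if new_state == SHARED:
        # Case a: merge the holders; processors are left untouched.
        return {requester} | set(subset_holders), set(), False
    # Case b: the new entry is to be modified, so the line is
    # invalidated out of all processors named by the subset directory.
    return {requester}, set(subset_holders), False
```

For example, a shared request from processor 0 that hits a shared subset entry held by processors 1 and 2 yields holders {0, 1, 2} with nothing invalidated.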

If a new superset directory install requires an LRU-out of the superset directory (and consequently the system cache), the entry is still removed from the system cache, but the superset directory entry is not invalidated and is instead migrated into the subset directory.

-   a) If this superset-to-subset directory migration requires an LRU-out of the subset directory, the displaced entry is invalidated out of the subset directory as well as all processors identified by the subset directory as hosting the entry.

In the case of data invalidated out of the processors as a result of a subset directory invalidation, whether from a new superset directory install that detects a hit for the same entry in the subset directory, or from a new superset directory install that requires an LRU-out of the superset directory, appropriate action is taken, e.g., writing the entry back to main memory if it differs from the main memory copy.
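
The write-back action mentioned above can be sketched as follows. The function `invalidate_from_processors` and the dictionary models of the processor caches and main memory are hypothetical, chosen only to make the example self-contained.

```python
def invalidate_from_processors(address, holders, processor_caches, main_memory):
    """Invalidate a line out of each holder, writing back changed data.

    processor_caches: dict of processor ID -> {address: data} (hypothetical).
    main_memory: dict of address -> data (hypothetical).
    """
    for pid in holders:
        data = processor_caches[pid].pop(address, None)
        if data is not None and data != main_memory.get(address):
            # The processor copy differs from memory, so preserve it.
            main_memory[address] = data
```

A line whose processor copy matches main memory is simply dropped, while a differing copy is stored to memory before the line disappears from the processor.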

In a preferred exemplification of our invention, the superset and subset directories are matched and no special logic is required to identify the targeted subset directory receiving a migration from the superset directory, that is, if an entry LRUs out of superset [C][A], it migrates into subset [C][A]. For this reason only the superset directory needs or has LRU/MRU logic.
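
The paired-slot migration can be sketched with two equally sized grids. The dimensions, the list-of-lists representation, and the function `migrate` are assumptions for illustration; the way index `a` is taken to have been chosen already by the superset directory's LRU logic.

```python
C, A = 4, 2   # small, assumed dimensions for illustration

# Each directory is a C x A grid; None marks an empty slot.
superset = [[None] * A for _ in range(C)]
subset = [[None] * A for _ in range(C)]


def migrate(c, a, new_entry):
    """Install new_entry at superset[c][a], migrating the old entry.

    The same indices address the subset directory, so no second LRU
    decision is made. Returns the entry displaced from the subset
    directory (to be invalidated out of the processors) or None.
    """
    old = superset[c][a]
    superset[c][a] = new_entry
    if old is None:
        return None              # nothing to migrate
    displaced = subset[c][a]
    subset[c][a] = old
    return displaced
```

In this sketch the third install into the same slot is the first to displace an entry from the subset directory, since each slot pair holds two entries in total.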

The system and method described herein is a dual system directory structure that performs the role of system cache, i.e., data, and system control, i.e., coherency, when the size of the system cache is insufficient to contain the contents of all of the underlying caches. That is, it is a separate data/coherency cache.

While the invention has been described with respect to certain preferred embodiments and exemplifications, it is not intended to limit the scope of the invention thereby, but solely by the claims appended hereto.

1. A multiprocessor system having a dual system directory structure having two system cache directories performing the role of system cache for data, and system control for coherency.
2. The multiprocessor system of claim 1 wherein the two cache directories are equal in size and collectively large enough to contain all of the processor cache directory entries.
3. The multiprocessor system of claim 2 wherein one of the cache directories hosts system-cache data to back the most recent data accessed by the processors.
4. The multiprocessor system of claim 3 wherein the directory known to be backed by system cached data will be evaluated for system cache data hits.
5. The multiprocessor system of claim 2 wherein one of the cache directories hosts system cache coherency data.
6. The multiprocessor system of claim 5 wherein the directory hosting system cache coherency data maintains a record of which processors have cached which portions of memory they are actually using.
7. A method of providing cache coherence to a multiprocessor system having a dual system directory structure having two system cache directories performing the role of system cache for data, and system control for coherency, comprising writing an entry to a superset cache, and writing an address and a calling processor identifier for the entry to a subset cache, LRUing the entry out of the superset cache and retaining the address and calling processor identifier for the entry in the subset cache.
8. The method of claim 7 comprising hosting system-cache data in the subset cache to back the most recent data accessed by the processors.
9. The method of claim 8 evaluating the subset cache for system cache data hits.
10. The method of claim 7 comprising maintaining a record in the subset cache of which processors have cached which portions of memory they are actually using.